Using Transliteration of Proper Names from Arabic to Latin Script to Improve English-Arabic Word Alignment

نویسندگان

  • Nasredine Semmar
  • Houda Saadane
چکیده

Bilingual lexicons of proper names play a vital role in machine translation and cross-language information retrieval. Word alignment approaches are generally used to construct bilingual lexicons automatically from parallel corpora. Aligning proper names is a task particularly difficult when the source and target languages of the parallel corpus do not share a same written script. We present in this paper a system to transliterate automatically proper names from Arabic to Latin script, and a tool to align single and compound words from English-Arabic parallel texts. We particularly focus on the impact of using transliteration to improve the performance of the word alignment tool. We have evaluated the word alignment tool integrating transliteration of proper names from Arabic to Latin script using two methods: A manual evaluation of the alignment quality and an evaluation of the impact of this alignment on the translation quality by using the open source statistical machine translation system Moses. Experiments show that integrating transliteration of proper names into the alignment process improves the Fmeasure of word alignment from 72% to 81% and the translation BLEU score from 20.15% to 20.63%.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Study of the impact of proper name transliteration on the performance of word alignment in French-Arabic parallel corpora (Etude de l'impact de la translittération de noms propres sur la qualité de l'alignement de mots à partir de corpus parallèles français-arabe) [in French]

Bilingual lexicons play a vital role in cross-language information retrieval and machine translation. The manual construction of these lexicons is often costly and time consuming. Word alignment techniques are generally used to construct bilingual lexicons from parallel texts. Aligning single words and nominal syntagms from parallel texts is relatively a well controlled task for languages using...

متن کامل

Combining probability models and web mining models: a framework for proper name transliteration

The rapid growth of the Internet has created a tremendous number of multilingual resources. However, language boundaries prevent information sharing and discovery across countries. Proper names play an important role in search queries and knowledge discovery. When foreign names are involved, proper names are often translated phonetically which is referred to as transliteration. In this research...

متن کامل

Translating Names and Technical Terms in Arabic Text

It is challenging to translate names and technical terms from English into Arabic. Translation is usually done phonetically: different alphabets and sound inventories force various compromises. For example, Peter Streams may come out as hr..~ ~ bytr szrymz. This process is called transliteration. We address here the reverse problem: given a foreign name or loanword in Arabic text, we want to re...

متن کامل

Cross Linguistic Name Matching in English and Arabic

This paper presents a solution to the problem of matching personal names in English to the same names represented in Arabic script. Standard string comparison measures perform poorly on this task due to varying transliteration conventions in both languages and the fact that Arabic script does not usually represent short vowels. Significant improvement is achieved by augmenting the classic Leven...

متن کامل

English-Arabic Transliteration

Proper nouns may be considered as the most important query words in information retrieval. If the two languages use the same alphabet, the same proper nouns can be found in either language. However, if the two languages use different alphabets, the names must be transliterated. Short vowels are not usually marked on the Arabic words in almost all Arabic documents (except very important document...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013